182 research outputs found

Optimal Multistage Algorithm for Adjoint Computation

Author: Aupy Guillaume
Herrmann Julien
Hovland Paul
Robert Yves
Publication venue: 'Society for Industrial & Applied Mathematics (SIAM)'
Publication date: 01/01/2016

We reexamine the work of Stumm and Walther on multistage algorithms for adjoint computation. We provide an optimal algorithm for this problem when there are two levels of checkpoints, in memory and on disk. Previously, optimal algorithms for adjoint computations were known only for a single level of checkpoints with no writing and reading costs; a well-known example is the binomial checkpointing algorithm of Griewank and Walther. Stumm and Walther extended that binomial checkpointing algorithm to the case of two levels of checkpoints, but they did not provide any optimality results. We bridge the gap by designing the first optimal algorithm in this context. We experimentally compare our optimal algorithm with that of Stumm and Walther to assess the difference in performance.
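
For context on the single-level baseline named in the abstract, the following is a minimal sketch of the classical capacity bound for binomial checkpointing: under the usual convention, with c checkpoints and at most r repeated forward evaluations, up to binomial(c + r, c) steps can be reversed. The function names are hypothetical, and the sketch does not implement the two-level (memory and disk) algorithm of the paper or account for any read and write costs.

```python
from math import comb


def max_reversible_steps(checkpoints: int, repetitions: int) -> int:
    """Classical single-level bound: binomial(c + r, c) steps."""
    return comb(checkpoints + repetitions, checkpoints)


def min_repetitions(steps: int, checkpoints: int) -> int:
    """Smallest repetition count r whose bound covers `steps` (hypothetical helper)."""
    if steps < 1 or checkpoints < 1:
        raise ValueError("steps and checkpoints must both be at least 1")
    r = 0
    # Increase r until the binomial bound reaches the requested chain length.
    while max_reversible_steps(checkpoints, r) < steps:
        r += 1
    return r
```

The bound grows quickly in both arguments, which is why a small number of in-memory checkpoints can cover long computation chains at a modest recomputation cost.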

HAL-ENS-LYON

INRIA a CCSD electronic archive server

Hal-Diderot

Towards Rapid Robotic Fibre placement with in-situ ultra-violet curing

Author: Compston Paul
Di Pietro Adriano
Gresham Robert
Hovland Geir
Publication venue: 'RMIT Publishing'
Publication date: 08/12/2015

The Australian National University

Enhancing Virtual Distillation with Circuit Cutting for Quantum Error Mitigation

Author: Hovland Paul
Li Peiyi
Liu Ji
Patil Hrushikesh Pramod
Zhou Huiyang
Publication date: 09/10/2023

Virtual distillation is a technique that aims to mitigate errors in noisy quantum computers. It works by preparing multiple copies of a noisy quantum state, bridging them through a circuit, and conducting measurements. As the number of copies increases, this process allows for the estimation of the expectation value with respect to a state that approaches the ideal pure state rapidly. However, virtual distillation faces a challenge in realistic scenarios: preparing multiple copies of a quantum state and bridging them through a circuit in a noisy quantum computer will significantly increase the circuit size and introduce excessive noise, which will degrade the performance of virtual distillation. To overcome this challenge, we propose an error mitigation strategy that uses circuit-cutting technology to cut the entire circuit into fragments. With this approach, the fragments responsible for generating the noisy quantum state can be executed on a noisy quantum device, while the remaining fragments are efficiently simulated on a noiseless classical simulator. By running each fragment circuit separately on quantum and classical devices and recombining their results, we can reduce the noise accumulation and enhance the effectiveness of the virtual distillation technique. Our strategy has good scalability in terms of both runtime and computational resources. We demonstrate our strategy's effectiveness through noisy simulation and experiments on a real quantum device. Comment: 8 pages, 5 figure
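
The effect of adding copies can be illustrated numerically. The sketch below assumes the standard virtual-distillation estimator Tr(O rho^M) / Tr(rho^M) for M copies, which the abstract does not spell out, and it further assumes that the noisy state rho and the observable O are diagonal in the same basis so that plain Python lists suffice. The function name and the example numbers are hypothetical; no circuit cutting or device noise is modelled.

```python
def distilled_expectation(eigenvalues, observable_values, copies):
    """Tr(O rho^M) / Tr(rho^M) for rho and O diagonal in a shared basis."""
    # Raising rho to the M-th power raises each eigenvalue to the M-th power.
    weights = [p ** copies for p in eigenvalues]
    numerator = sum(o * w for o, w in zip(observable_values, weights))
    return numerator / sum(weights)


# Hypothetical noisy state: the dominant eigenvalue 0.8 stands for the ideal
# pure state, whose observable value is +1; the rest is noise.
noisy_spectrum = [0.8, 0.15, 0.05]
observable = [1.0, -1.0, -1.0]

for m in (1, 2, 3):
    print(m, distilled_expectation(noisy_spectrum, observable, m))
```

As M grows, the dominant eigenvalue's weight suppresses the noise terms, so the estimate moves toward the value of the observable in the dominant eigenstate.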

arXiv.org e-Print Archive

Model Checking Race-freedom When "Sequential Consistency for Data-race-free Programs" is Guaranteed

Author: Hovland Paul D.
Hückelheim Jan
Luo Ziqing
Siegel Stephen F.
Wu Wenhao
Publication date: 29/05/2023

Many parallel programming models guarantee that if all sequentially consistent (SC) executions of a program are free of data races, then all executions of the program will appear to be sequentially consistent. This greatly simplifies reasoning about the program, but leaves open the question of how to verify that all SC executions are race-free. In this paper, we show that with a few simple modifications, model checking can be an effective tool for verifying race-freedom. We explore this technique on a suite of C programs parallelized with OpenMP.

arXiv.org e-Print Archive

Automatic Differentiation for Adjoint Stencil Loops

Author: Gorman Gerard
Hovland Paul
Hückelheim Jan
Kukreja Navjot
Luporini Fabio
Narayanan Sri Hari Krishna
Publication venue: 'Association for Computing Machinery (ACM)'
Publication date: 05/07/2019

Stencil loops are a common motif in computations including convolutional neural networks, structured-mesh solvers for partial differential equations, and image processing. Stencil loops are easy to parallelise, and their fast execution is aided by compilers, libraries, and domain-specific languages. Reverse-mode automatic differentiation, also known as algorithmic differentiation, autodiff, adjoint differentiation, or back-propagation, is sometimes used to obtain gradients of programs that contain stencil loops. Unfortunately, conventional automatic differentiation results in a memory access pattern that is not stencil-like and not easily parallelisable. In this paper we present a novel combination of automatic differentiation and loop transformations that preserves the structure and memory access pattern of stencil loops, while computing fully consistent derivatives. The generated loops can be parallelised and optimised for performance in the same way and using the same tools as the original computation. We have implemented this new technique in the Python tool PerforAD, which we release with this paper along with test cases derived from seismic imaging and computational fluid dynamics applications. Comment: ICPP 201
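
The access-pattern problem described in the abstract can be seen on a three-point stencil. The sketch below is an illustration under assumptions, not PerforAD's generated code: all function names are hypothetical, and the gather version handles boundaries with a guard function, whereas a loop-transformation tool might instead emit separate core and remainder loops. The conventional adjoint scatters into three neighbouring entries per iteration, while the rewritten adjoint writes each entry exactly once, like the original stencil.

```python
def stencil(inp, a, b, c):
    """Primal three-point stencil over the interior points."""
    n = len(inp)
    out = [0.0] * n
    for i in range(1, n - 1):
        out[i] = a * inp[i - 1] + b * inp[i] + c * inp[i + 1]
    return out


def adjoint_scatter(out_b, a, b, c):
    """Conventional reverse mode: each iteration updates three entries."""
    n = len(out_b)
    inp_b = [0.0] * n
    for i in range(1, n - 1):
        inp_b[i - 1] += a * out_b[i]
        inp_b[i] += b * out_b[i]
        inp_b[i + 1] += c * out_b[i]
    return inp_b


def adjoint_gather(out_b, a, b, c):
    """Stencil-shaped adjoint: each entry is written once from its neighbours."""
    n = len(out_b)

    def seed(i):
        # Only interior outputs were computed by the primal loop.
        return out_b[i] if 1 <= i <= n - 2 else 0.0

    return [c * seed(j - 1) + b * seed(j) + a * seed(j + 1) for j in range(n)]
```

Because each iteration of the gather form writes a distinct entry, its iterations are independent and can be parallelised without atomic updates, which is the property the abstract highlights.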

arXiv.org e-Print Archive

Crossref

Efficient precision simulation of processes with many-jet final states at the LHC

Author: Bothmann Enrico
Childers Taylor
Guetschow Christian
Hovland Paul
Höche Stefan
Isaacsson Joshua
Knobbe Max
Latham Robert
Publication date: 22/09/2023

We present a scalable technique for the simulation of collider events with multi-jet final states, based on an improved parton-level event file format. The method is implemented for both leading- and next-to-leading order QCD calculations. We perform a comprehensive analysis of the I/O performance and validate our new framework using Higgs-boson plus multi-jet production with up to seven jets. We make the resulting code base available for public use. Comment: 14 pages, 7 figures, 2 table

arXiv.org e-Print Archive

QContext: Context-Aware Decomposition for Quantum Gates

Author: Bowman Max
Chong Frederic T.
Dangwal Siddharth
Gokhale Pranav
Hovland Paul D.
Larson Jeffrey
Liu Ji
Publication date: 03/02/2023

In this paper we propose QContext, a new compiler structure that incorporates context-aware and topology-aware decompositions. Because of circuit equivalence rules and resynthesis, variants of a gate-decomposition template may exist. QContext exploits the circuit information and the hardware topology to select the gate variant that increases circuit optimization opportunities. We study the basis-gate-level context-aware decomposition for Toffoli gates and the native-gate-level context-aware decomposition for CNOT gates. Our experiments show that QContext reduces the number of gates as compared with the state-of-the-art approach, Orchestrated Trios. Comment: 10 page

arXiv.org e-Print Archive

ytopt: Autotuning Scientific Applications for Energy Efficiency at Large Scales

Author: Balaprakash Prasanna
Geltz Brad
Hall Mary
Hovland Paul
Jana Siddhartha
Koo Jaehoon
Kruse Michael
Taylor Valerie
Videau Brice
Wu Xingfu
Publication date: 28/03/2023

As we enter the exascale computing era, efficiently utilizing power and optimizing the performance of scientific applications under power and energy constraints have become critical and challenging. We propose a low-overhead autotuning framework to autotune performance and energy for various hybrid MPI/OpenMP scientific applications at large scales and to explore the tradeoffs between application runtime and power/energy for energy-efficient application execution, then use this framework to autotune four ECP proxy applications: XSBench, AMG, SWFFT, and SW4lite. Our approach uses Bayesian optimization with a Random Forest surrogate model to effectively search parameter spaces with up to 6 million different configurations on two large-scale production systems, Theta at Argonne National Laboratory and Summit at Oak Ridge National Laboratory. The experimental results show that our autotuning framework at large scales has low overhead and achieves good scalability. Using the proposed autotuning framework to identify the best configurations, we achieve up to 91.59% performance improvement, up to 21.2% energy savings, and up to 37.84% EDP improvement on up to 4,096 nodes.

arXiv.org e-Print Archive

Reverse-mode algorithmic differentiation of an OpenMP-parallel compressible flow solver

Author: Blazek J
Förster M
Giles MB
Jan Hückelheim
Jens-Dominik Müller
Michelle Mills Strout
Naumann U
Paul Hovland
Spalart P
Publication venue: 'SAGE Publications'
Publication date: 03/05/2017

Crossref

Queen Mary Research Online